Simulation results 5

Sequential design with early stopping (restricted action set) - risk based

Published

May 8, 2025

Modified

May 8, 2025

Load simulation results
# Each input file corresponds to the results from a single simulation
# scenario/configuration.
# Load all the files into a single list.

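# files of interest
sim_lab <- "sim05-13"
dir_sim <- file.path("data", sim_lab)

flist <- list.files(dir_sim, pattern = "sim05")
toks <- list()
l <- list()
# seq_along() is safe when no files are found (1:length(flist) would give 1, 0)
for(i in seq_along(flist)){
  l[[i]] <- qs::qread(file.path(dir_sim, flist[i]))
  # split the file name on "-" and "." to recover the scenario tokens
  toks[[i]] <-  unlist(data.table::tstrsplit(flist[i], "[-.]"))
}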

Introduction

In the present design, the model is used to compute unit-level risk (probability), and decisions are assessed on the risk difference for the intervention comparisons in each domain. The log-odds scale has limited interpretability for clinical users and is inconsistent in absolute terms across strata with varying baseline rates. It is also useful to explore what an absolute perspective on effectiveness translates to in terms of operating characteristics. Simulation parameters continue to be expressed as log-odds-ratios, which keeps the simulation process simpler. Results are presented on both the odds and risk scales. The aim is to give better intuition and transparency about the magnitude of the effects we are contemplating and to determine whether these are reasonable assumptions.
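To make the point about baseline rates concrete, the sketch below converts a fixed odds ratio into a risk difference at several baseline risks. The baseline values are hypothetical and chosen only for illustration; they are not the rates used in the simulations.

```r
# Convert an odds ratio into a risk difference at a given baseline risk.
rd_from_or <- function(p0, or) {
  p1 <- plogis(qlogis(p0) + log(or))  # shift the baseline on the log-odds scale
  p1 - p0
}

# Hypothetical baseline risks (assumed for illustration only)
p0 <- c(0.5, 0.65, 0.8, 0.9)
rd <- rd_from_or(p0, or = 1.75)

# The same OR of 1.75 yields a smaller absolute gain as the baseline rises
round(data.frame(baseline = p0, risk_difference = rd), 3)
```

The same odds ratio therefore corresponds to quite different absolute effects depending on the stratum, which is why the decisions here are framed on the risk difference.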

We have a single, large, multivariable logistic regression model. One of the sets of parameters accounts for clinician preference for revision type in order to achieve conditional exchangeability across groups. At each interim, we assess the posterior and if a decision threshold is met, we act. For example, if a superiority decision is reached in one of the domains for which this decision type is relevant, then we consider that domain dealt with and all subsequent participants are assigned to receive the superior intervention. We can (and presently do) continue to update the posterior inference for the comparison that has stopped in subsequent interim analyses until we get to the point where all questions have been answered in all domains, at which point the trial will stop.

The priors were as follows:

  • Reference log-odds of response: logistic distribution, mean 0 and scale 0.47
  • Silo effects: normal distribution, mean 0 and scale 1
  • Joint effects: normal distribution, mean 0 and scale 1
  • Preference effects: normal distribution, mean 0 and scale 1
  • Treatment effects: normal distribution, mean 0 and scale 1
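As a rough check on what these priors imply on the risk scale, the sketch below draws from the reference and treatment-effect priors and transforms them to probabilities. It ignores the silo, joint and preference terms and treats the two parameters as independent, so it is only an approximation to the prior predictive of the full model.

```r
set.seed(1)
n_draw <- 1e5

# Reference log-odds of response: logistic(0, 0.47), as listed above
b0 <- rlogis(n_draw, location = 0, scale = 0.47)
# Treatment effect on the log-odds-ratio scale: normal(0, 1), as listed above
b_trt <- rnorm(n_draw, mean = 0, sd = 1)

p_ref <- plogis(b0)          # implied prior on the reference risk
p_trt <- plogis(b0 + b_trt)  # implied prior on the treated risk

# Prior quantiles for the reference risk and for the risk difference
quantile(p_ref, probs = c(0.025, 0.5, 0.975))
quantile(p_trt - p_ref, probs = c(0.025, 0.5, 0.975))
```

A logistic prior with scale 1 on the log-odds would be uniform on the probability scale; the smaller scale of 0.47 concentrates the reference risk towards 0.5.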

Superiority and non-inferiority are applicable to some domains and not others; however, we define reference and threshold values for all domains, just in case.

For the superiority decision, a reference value of 0 was used and the probability thresholds were:

  • 0.94 for surgical domain
  • 0.98 for antibiotic duration domain
  • 0.98 for extended prophylaxis domain
  • 0.995 for antibiotic choice domain

For the futility decision (in relation to superiority) a reference value of 0.05 was used and the probability thresholds were:

  • 0.3 for surgical domain
  • 0.25 for antibiotic duration domain
  • 0.25 for extended prophylaxis domain
  • 0.25 for antibiotic choice domain

The above means, for example, that if the probability that the risk difference is greater than 0.05 is less than 0.3 in the surgical domain comparison, then we say the superiority goal is futile.

For the non-inferiority (NI) decision, a reference value of -0.05 was used and the probability thresholds were:

  • 0.98 for surgical domain
  • 0.925 for antibiotic duration domain
  • 0.98 for extended prophylaxis domain
  • 0.98 for antibiotic choice domain

The above means, for example, that if the probability that the risk difference is greater than -0.05 exceeds 0.925 in the antibiotic duration domain, then we say the intervention is non-inferior.

The futility decision (in relation to non-inferiority) has a reference value of 0 and the probability thresholds were:

  • 0.25 for surgical domain
  • 0.1 for antibiotic duration domain
  • 0.25 for extended prophylaxis domain
  • 0.25 for antibiotic choice domain

This means, for example, that if the probability that the risk difference is greater than 0 is less than 0.1 in the antibiotic duration domain, then we say the non-inferiority goal is futile.
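The four rules above can be sketched as a single function applied to posterior draws of the risk difference. The thresholds and reference values are those listed above; the function name, the domain labels and the simulated draws are hypothetical and only stand in for the output of the actual model.

```r
# Probability thresholds by domain, as listed above (labels are illustrative)
rules <- data.frame(
  domain      = c("surgical", "ab_duration", "ext_proph", "ab_choice"),
  thr_sup     = c(0.94, 0.98, 0.98, 0.995),
  thr_fut_sup = c(0.30, 0.25, 0.25, 0.25),
  thr_ni      = c(0.98, 0.925, 0.98, 0.98),
  thr_fut_ni  = c(0.25, 0.10, 0.25, 0.25)
)

# Evaluate all four rules for a vector of posterior risk-difference draws
assess <- function(rd, domain) {
  r <- rules[rules$domain == domain, ]
  c(
    sup     = mean(rd > 0)     > r$thr_sup,      # superiority, reference 0
    fut_sup = mean(rd > 0.05)  < r$thr_fut_sup,  # futility for superiority
    ni      = mean(rd > -0.05) > r$thr_ni,       # non-inferiority, margin -0.05
    fut_ni  = mean(rd > 0)     < r$thr_fut_ni    # futility for non-inferiority
  )
}

set.seed(1)
# Hypothetical posterior draws centred on a risk difference of 0.06
rd_draws <- rnorm(4000, mean = 0.06, sd = 0.03)
res <- assess(rd_draws, "surgical")
res
```

In the trial only the decision types relevant to a domain would be acted on; the sketch evaluates all four simply because thresholds are defined for every domain.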

Figure 1 attempts to put the superiority rules into pictures based on possible scenarios for assessment of the posterior risk difference for an arbitrary domain where superiority is being assessed. The approach, reference values and thresholds apply to all domains where superiority is assessed.

Figure 1: Visualisation of decision rule scenarios for superiority

Analogously, Figure 2 puts the non-inferiority rule into pictures based on possible scenarios for assessment of the posterior risk difference for an arbitrary domain where non-inferiority is being assessed. The approach, reference values and thresholds apply to all domains.

Figure 2: Visualisation of decision rule scenarios for non-inferiority

For this set of simulations, the number of simulated trials per scenario was 1000 and the simulation label is sim05-13.

Simulation results

Table 1 shows the cumulative probability of a superiority decision across each of the scenarios simulated (the same information is shown in Figure 3). Operating characteristics are shown only for the relevant domains and the futility of a superiority decision is included in parentheses.

Notes (last edit 2025-04-29):

  1. Power for the ‘average’ surgical revision effect reaches about 86% when both one-stage and two-stage revision have a moderate effect, i.e. when both of these procedures increase the log-odds of response by the same \(\log(1.75)\). When only one-stage or only two-stage revision shows an effect (the other having zero effect), the weighted average effect of revision is lower and hence the power is lower. The decisions are based on the aggregated effect of both revision types, not on the method selected by the clinician for revision.
  2. With the lower threshold value for superiority in the surgical domain, there is a commensurate increase in the type-I assertion probability.
  3. AB extended prophylaxis receives entrants from the acute, late and chronic silos and hence has more data to work with, leading to a higher overall cumulative probability of stopping. The same applies to the AB choice domain. In general, these domains therefore have better overall power than the surgical domain when effects are present.
  4. For the 90% power example, the effect (OR) sizes by domain are 1.87, 1.45, 1.73, 1.6 for surgical, ab duration, extended prophylaxis and choice respectively.
Code
d_tbl_1_cur <- d_tbl_1[quant %in% c("sup", "fut_sup") & domain %in% c(1, 3, 4), .SD]
setorderv(d_tbl_1_cur, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))
d_tbl_1_cur <- dcast(d_tbl_1_cur, scenario + desc + domain ~ quant + N, value.var = "pr_val")
d_tbl_1_cur <- d_tbl_1_cur[, .SD, .SDcols = !c("scenario")]

d_tbl_1_cur[, domain := factor(domain, 
                               levels = c(1, 3, 4), 
                               labels = c("Surgical", "AB Ext-proph", "AB Choice"))]

g_tbl <- d_tbl_1_cur |> 
  gt(groupname_col = "desc") |> 
  gt::text_transform(
    locations = cells_row_groups(),
    fn = function(x) {
      lapply(x, function(x) {
        gt::md(paste0("*", x, "*"))
      })
    }
  ) |>
  cols_align(
    columns = 1,
    align = "left"
  )  |> 
  cols_align(
    columns = 2:ncol(d_tbl_1_cur),
    align = "center"
  )  |> 
  cols_merge(
    columns = c("sup_500", "fut_sup_500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )   |> 
  cols_merge(
    columns = c("sup_1000", "fut_sup_1000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |> 
  cols_merge(
    columns = c("sup_1500", "fut_sup_1500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("sup_2000", "fut_sup_2000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("sup_2500", "fut_sup_2500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |>
  tab_spanner(
    label = md("Cumulative probability of **superiority** (futility) decision"),
    columns = 3:ncol(d_tbl_1_cur)
  )  |>
  cols_label(
    domain = "Domain",
    sup_500 = html("500"),
    sup_1000 = html("1000"),
    sup_1500 = html("1500"),
    sup_2000 = html("2000"),
    sup_2500 = html("2500")
  ) |>
  tab_options(
    table.font.size = "70%"
  ) |>
  fmt_number(decimals = 3, drop_trailing_zeros = TRUE)

g_tbl
Domain
Cumulative probability of superiority (futility) decision
500 1000 1500 2000 2500
Null effect in all domains
Surgical 0.041 (0.539) 0.057 (0.683) 0.068 (0.74) 0.077 (0.76) 0.081 (0.774)
AB Ext-proph 0.027 (0.432) 0.037 (0.576) 0.043 (0.654) 0.053 (0.712) 0.06 (0.756)
AB Choice 0.002 (0.572) 0.009 (0.766) 0.015 (0.865) 0.02 (0.913) 0.021 (0.938)
Moderate (OR 1.75) surgical revision effect (both one and two-stage)
Surgical 0.523 (0.039) 0.689 (0.05) 0.763 (0.054) 0.798 (0.056) 0.813 (0.056)
AB Ext-proph 0.023 (0.473) 0.041 (0.668) 0.051 (0.765) 0.055 (0.825) 0.06 (0.863)
AB Choice 0 (0.583) 0.008 (0.771) 0.012 (0.866) 0.015 (0.916) 0.016 (0.945)
Moderate (OR 1.75) surgical revision effect (one-stage only)
Surgical 0.12 (0.275) 0.213 (0.356) 0.253 (0.392) 0.289 (0.409) 0.31 (0.419)
AB Ext-proph 0.019 (0.444) 0.037 (0.616) 0.047 (0.707) 0.062 (0.76) 0.065 (0.798)
AB Choice 0.007 (0.565) 0.01 (0.755) 0.015 (0.84) 0.02 (0.89) 0.022 (0.923)
Moderate (OR 1.75) surgical revision effect (two-stage only)
Surgical 0.223 (0.149) 0.37 (0.202) 0.455 (0.225) 0.485 (0.235) 0.507 (0.24)
AB Ext-proph 0.019 (0.453) 0.038 (0.614) 0.05 (0.71) 0.06 (0.775) 0.066 (0.816)
AB Choice 0.006 (0.565) 0.009 (0.757) 0.01 (0.861) 0.014 (0.912) 0.015 (0.942)
Moderate (OR 1.75) antibiotic duration 6wk effect
Surgical 0.039 (0.521) 0.062 (0.644) 0.079 (0.688) 0.082 (0.707) 0.084 (0.717)
AB Ext-proph 0.014 (0.475) 0.024 (0.633) 0.034 (0.716) 0.043 (0.764) 0.045 (0.809)
AB Choice 0.004 (0.587) 0.008 (0.771) 0.013 (0.869) 0.013 (0.927) 0.017 (0.95)
Moderate (OR 1.75) antibiotic ext-proph 12wk effect
Surgical 0.035 (0.519) 0.054 (0.665) 0.074 (0.718) 0.08 (0.746) 0.088 (0.758)
AB Ext-proph 0.312 (0.052) 0.534 (0.061) 0.676 (0.064) 0.757 (0.067) 0.811 (0.07)
AB Choice 0.003 (0.566) 0.007 (0.766) 0.012 (0.859) 0.014 (0.909) 0.015 (0.938)
Moderate (OR 1.75) antibiotic choice rifampicin effect
Surgical 0.027 (0.555) 0.049 (0.689) 0.058 (0.732) 0.064 (0.755) 0.071 (0.768)
AB Ext-proph 0.012 (0.468) 0.021 (0.611) 0.033 (0.697) 0.039 (0.756) 0.045 (0.792)
AB Choice 0.426 (0.014) 0.788 (0.018) 0.936 (0.018) 0.978 (0.018) 0.982 (0.018)
Moderate (OR 1.75) effects in all domains
Surgical 0.589 (0.018) 0.784 (0.023) 0.855 (0.025) 0.875 (0.026) 0.882 (0.026)
AB Ext-proph 0.28 (0.058) 0.604 (0.065) 0.821 (0.065) 0.89 (0.065) 0.917 (0.065)
AB Choice 0.416 (0.019) 0.762 (0.021) 0.91 (0.021) 0.959 (0.021) 0.975 (0.021)
Large (OR 2.5) surgical revision effect (both one and two-stage)
Surgical 0.909 (0) 0.973 (0) 0.985 (0) 0.991 (0) 0.991 (0)
AB Ext-proph 0.02 (0.493) 0.034 (0.698) 0.045 (0.809) 0.051 (0.867) 0.056 (0.898)
AB Choice 0.003 (0.568) 0.007 (0.773) 0.009 (0.88) 0.011 (0.917) 0.013 (0.939)
Large (OR 2.5) surgical revision effect (one-stage only)
Surgical 0.249 (0.096) 0.408 (0.123) 0.49 (0.144) 0.536 (0.154) 0.563 (0.161)
AB Ext-proph 0.016 (0.48) 0.032 (0.647) 0.047 (0.752) 0.053 (0.824) 0.058 (0.865)
AB Choice 0.003 (0.578) 0.008 (0.776) 0.011 (0.857) 0.015 (0.915) 0.016 (0.943)
Large (OR 2.5) surgical revision effect (two-stage only)
Surgical 0.458 (0.035) 0.672 (0.047) 0.745 (0.053) 0.78 (0.053) 0.796 (0.054)
AB Ext-proph 0.02 (0.52) 0.033 (0.712) 0.047 (0.796) 0.05 (0.847) 0.055 (0.887)
AB Choice 0.006 (0.554) 0.007 (0.763) 0.01 (0.847) 0.011 (0.905) 0.011 (0.931)
Large (OR 2.5) antibiotic duration 6wk effect
Surgical 0.036 (0.531) 0.062 (0.646) 0.071 (0.677) 0.078 (0.692) 0.08 (0.702)
AB Ext-proph 0.02 (0.463) 0.031 (0.622) 0.037 (0.701) 0.041 (0.753) 0.046 (0.786)
AB Choice 0.005 (0.551) 0.013 (0.741) 0.017 (0.836) 0.024 (0.891) 0.025 (0.924)
Large (OR 2.5) antibiotic ext-proph 12wk effect
Surgical 0.048 (0.477) 0.072 (0.592) 0.09 (0.63) 0.09 (0.651) 0.093 (0.655)
AB Ext-proph 0.694 (0.003) 0.913 (0.003) 0.969 (0.004) 0.985 (0.004) 0.993 (0.004)
AB Choice 0.007 (0.546) 0.016 (0.739) 0.02 (0.844) 0.021 (0.897) 0.022 (0.928)
Large (OR 2.5) antibiotic choice rifampicin effect
Surgical 0.022 (0.525) 0.033 (0.67) 0.043 (0.722) 0.049 (0.751) 0.055 (0.768)
AB Ext-proph 0.025 (0.467) 0.035 (0.619) 0.045 (0.7) 0.049 (0.753) 0.052 (0.785)
AB Choice 0.888 (0) 0.997 (0) 1 (0) 1 (0) 1 (0)
Large (OR 2.5) effects in all domains
Surgical 0.945 (0.001) 0.981 (0.001) 0.987 (0.001) 0.991 (0.001) 0.991 (0.001)
AB Ext-proph 0.555 (0.012) 0.908 (0.013) 0.976 (0.014) 0.986 (0.014) 0.987 (0.014)
AB Choice 0.811 (0) 0.985 (0) 0.999 (0) 1 (0) 1 (0)
Moderate (OR 1.75) surgical revision effect (both one and two-stage) and antibiotic duration
Surgical 0.557 (0.024) 0.768 (0.028) 0.832 (0.03) 0.862 (0.031) 0.868 (0.032)
AB Ext-proph 0.306 (0.046) 0.62 (0.053) 0.827 (0.056) 0.907 (0.057) 0.936 (0.057)
AB Choice 0.006 (0.565) 0.01 (0.766) 0.012 (0.858) 0.013 (0.908) 0.016 (0.934)
Effects to achieve 90% power
Surgical 0.675 (0.008) 0.852 (0.013) 0.893 (0.014) 0.909 (0.016) 0.918 (0.017)
AB Ext-proph 0.254 (0.062) 0.615 (0.072) 0.804 (0.073) 0.879 (0.075) 0.915 (0.075)
AB Choice 0.262 (0.035) 0.608 (0.045) 0.79 (0.049) 0.879 (0.052) 0.928 (0.052)
Table 1: Cumulative probability of superiority (futility in parentheses) decision at each interim (shown by total enrolment by interim)
Code
d_fig <- d_tbl_1[quant %in% c("sup", "fut_sup") & domain %in% c(1, 3, 4), .SD]
setorderv(d_fig, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))

d_fig[, domain := factor(domain, 
                         levels = c(1, 3, 4), 
                         labels = c("Surgical", "AB Ext-proph", "AB Choice"))]

d_fig[, quant := factor(quant, 
                        levels = c("sup", "fut_sup"), 
                        labels = c("Superiority", "Futility"))]

d_fig[, desc := factor(desc, 
                        levels = unique(d_fig$desc), 
                        labels = unique(d_fig$desc))]


ggplot(d_fig, aes(x = N, y = pr_val, group = quant, col = quant)) +
  geom_line(lwd = 0.25) +
  scale_y_continuous("", breaks = seq(0, 1, by = 0.2)) +
  scale_color_discrete("") +
  facet_grid(desc ~ domain) + 
  # facet_grid2(desc ~ domain, render_empty = FALSE)
  theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
    strip.text.x.top = element_text(size = 4),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1),
    axis.text.y = element_text(size = 4), 
    axis.text.x = element_text(size = 4), 
    axis.title.x = element_text(size = 5),
    legend.text = element_text(size = 4)
  )
Figure 3: Cumulative probabilities for superiority assessments

Table 2 shows the cumulative probability of a non-inferiority decision with futility shown in parentheses (the same information is shown in Figure 5). The results are only shown for the domains for which non-inferiority is evaluated.

Notes:

  1. The “Null effect in all domains” label is a bit of a misnomer: the true null for the NI decision sits at the NI margin, whereas this scenario sets all effects to zero. An inflation over the usual type-I assertion probability is therefore to be expected.
  2. In this set of results the cumulative probability of claiming NI by 2500 is 0.76 when the effect size is OR 1.75 (contrasting with 0.67 from the last set of simulation results).
  3. In the null case (when the 6 week and 12 week response are effectively identical) there is only a ~10% cumulative probability of declaring NI. This is purely due to the thresholds we have selected. If equivalence is what we are actually thinking about, then we could potentially evaluate that instead of non-inferiority (see Figure 4).
Figure 4: Decision options based on posterior
  4. The higher power in the scenario where all domains have an effect arises because the surgical domain does not get shut down, so more participants enter the AB duration domain.
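The first note can be illustrated with a deliberately simplified calculation. The sketch below is not the trial model: it assumes a single analysis, two arms of 500, independent Beta(1, 1) priors and a hypothetical control risk of 0.60. It contrasts the probability of an NI claim when the true effect is zero with the probability when the true effect sits at the NI margin.

```r
set.seed(42)

# Probability of claiming NI at a single look for a two-arm comparison.
# Simplifying assumptions (not the trial model): Beta(1, 1) priors on each
# arm's risk, no covariates, no interim analyses.
pr_ni <- function(p_ctl, p_trt, n_arm = 500, n_sim = 500, n_draw = 2000,
                  margin = -0.05, thr = 0.925) {
  hits <- replicate(n_sim, {
    y_ctl <- rbinom(1, n_arm, p_ctl)
    y_trt <- rbinom(1, n_arm, p_trt)
    # Posterior draws of the risk difference (treatment minus control)
    rd <- rbeta(n_draw, 1 + y_trt, 1 + n_arm - y_trt) -
      rbeta(n_draw, 1 + y_ctl, 1 + n_arm - y_ctl)
    mean(rd > margin) > thr
  })
  mean(hits)
}

pr_zero   <- pr_ni(p_ctl = 0.60, p_trt = 0.60)  # zero effect ("null" scenario)
pr_margin <- pr_ni(p_ctl = 0.60, p_trt = 0.55)  # true effect at the NI margin
c(zero_effect = pr_zero, at_margin = pr_margin)
```

Under these assumptions the zero-effect setting yields NI claims far more often than the at-margin setting, so the zero-effect figure should not be read as a type-I assertion probability for NI.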
Code
d_tbl_1_cur <- d_tbl_1[quant %in% c("ni", "fut_ni") & domain %in% c(2), .SD]
setorderv(d_tbl_1_cur, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))
d_tbl_1_cur <- dcast(d_tbl_1_cur, scenario + desc + domain ~ quant + N, value.var = "pr_val")
d_tbl_1_cur <- d_tbl_1_cur[, .SD, .SDcols = !c("scenario")]

d_tbl_1_cur[, domain := factor(domain, 
                               levels = c(2), 
                               labels = c("AB Duration"))]

g_tbl <- d_tbl_1_cur |> 
  gt(groupname_col = "desc") |> 
  gt::text_transform(
    locations = cells_row_groups(),
    fn = function(x) {
      lapply(x, function(x) {
        gt::md(paste0("*", x, "*"))
      })
    }
  ) |>
  cols_align(
    columns = 1,
    align = "left"
  )  |> 
  cols_align(
    columns = 2:ncol(d_tbl_1_cur),
    align = "center"
  )  |> 
  cols_merge(
    columns = c("ni_500", "fut_ni_500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )   |> 
  cols_merge(
    columns = c("ni_1000", "fut_ni_1000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |> 
  cols_merge(
    columns = c("ni_1500", "fut_ni_1500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("ni_2000", "fut_ni_2000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("ni_2500", "fut_ni_2500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |>
  tab_spanner(
    label = md("Cumulative probability of **NI** (futility) decision"),
    columns = 3:ncol(d_tbl_1_cur)
  )  |>
  cols_label(
    domain = "Domain",
    ni_500 = html("500"),
    ni_1000 = html("1000"),
    ni_1500 = html("1500"),
    ni_2000 = html("2000"),
    ni_2500 = html("2500")
  ) |>
  tab_options(
    table.font.size = "70%"
  ) |>
  fmt_number(decimals = 3, drop_trailing_zeros = TRUE)

g_tbl
Domain
Cumulative probability of NI (futility) decision
500 1000 1500 2000 2500
Null effect in all domains
AB Duration 0.137 (0.074) 0.225 (0.113) 0.285 (0.146) 0.325 (0.173) 0.368 (0.19)
Moderate (OR 1.75) surgical revision effect (both one and two-stage)
AB Duration 0.129 (0.091) 0.254 (0.14) 0.334 (0.177) 0.396 (0.211) 0.441 (0.229)
Moderate (OR 1.75) surgical revision effect (one-stage only)
AB Duration 0.141 (0.095) 0.228 (0.153) 0.311 (0.173) 0.378 (0.192) 0.419 (0.208)
Moderate (OR 1.75) surgical revision effect (two-stage only)
AB Duration 0.133 (0.084) 0.245 (0.133) 0.313 (0.166) 0.362 (0.199) 0.413 (0.215)
Moderate (OR 1.75) antibiotic duration 6wk effect
AB Duration 0.466 (0.006) 0.688 (0.01) 0.79 (0.012) 0.867 (0.012) 0.908 (0.012)
Moderate (OR 1.75) antibiotic ext-proph 12wk effect
AB Duration 0.147 (0.07) 0.224 (0.105) 0.278 (0.136) 0.323 (0.158) 0.352 (0.174)
Moderate (OR 1.75) antibiotic choice rifampicin effect
AB Duration 0.136 (0.097) 0.211 (0.137) 0.266 (0.164) 0.301 (0.191) 0.333 (0.212)
Moderate (OR 1.75) effects in all domains
AB Duration 0.443 (0.018) 0.737 (0.02) 0.878 (0.02) 0.936 (0.021) 0.957 (0.021)
Large (OR 2.5) surgical revision effect (both one and two-stage)
AB Duration 0.149 (0.082) 0.275 (0.133) 0.364 (0.167) 0.43 (0.198) 0.489 (0.211)
Large (OR 2.5) surgical revision effect (one-stage only)
AB Duration 0.152 (0.104) 0.241 (0.15) 0.328 (0.178) 0.4 (0.211) 0.452 (0.228)
Large (OR 2.5) surgical revision effect (two-stage only)
AB Duration 0.132 (0.078) 0.249 (0.133) 0.336 (0.169) 0.396 (0.193) 0.44 (0.215)
Large (OR 2.5) antibiotic duration 6wk effect
AB Duration 0.691 (0.001) 0.874 (0.002) 0.948 (0.002) 0.972 (0.002) 0.988 (0.002)
Large (OR 2.5) antibiotic ext-proph 12wk effect
AB Duration 0.131 (0.083) 0.216 (0.123) 0.28 (0.161) 0.316 (0.191) 0.355 (0.213)
Large (OR 2.5) antibiotic choice rifampicin effect
AB Duration 0.134 (0.089) 0.203 (0.13) 0.274 (0.168) 0.322 (0.194) 0.355 (0.207)
Large (OR 2.5) effects in all domains
AB Duration 0.612 (0.003) 0.919 (0.003) 0.983 (0.003) 0.996 (0.003) 0.996 (0.003)
Moderate (OR 1.75) surgical revision effect (both one and two-stage) and antibiotic duration
AB Duration 0.137 (0.078) 0.254 (0.13) 0.335 (0.163) 0.403 (0.186) 0.462 (0.206)
Effects to achieve 90% power
AB Duration 0.325 (0.03) 0.6 (0.035) 0.77 (0.036) 0.866 (0.037) 0.904 (0.037)
Table 2: Cumulative probability of NI (futility in parentheses) decision at each interim (shown by total enrolment by interim)
Code
d_fig <- d_tbl_1[quant %in% c("ni", "fut_ni") & domain %in% c(2), .SD]
setorderv(d_fig, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))

d_fig[, domain := factor(domain, 
                               levels = c(2), 
                               labels = c("AB Duration"))]

d_fig[, quant := factor(quant, 
                        levels = c("ni", "fut_ni"), 
                        labels = c("NI", "Futility for NI"))]

d_fig[, desc := factor(desc, 
                        levels = unique(d_fig$desc), 
                        labels = unique(d_fig$desc))]


ggplot(d_fig, aes(x = N, y = pr_val, group = quant, col = quant)) +
  geom_line(lwd = 0.25) +
  scale_y_continuous("", breaks = seq(0, 1, by = 0.2)) +
  scale_color_discrete("") +
  facet_grid(desc ~ domain, space = "free") +
  
  # facet_manual(
  #   . ~ desc, design = matrix(1:17, ncol = 1),
  #   widths = unit(3, "cm")
  # ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
    strip.text.x.top = element_text(size = 4),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1),
    axis.text.y = element_text(size = 4), 
    axis.text.x = element_text(size = 4), 
    axis.title.x = element_text(size = 5),
    legend.text = element_text(size = 4)
  )
Figure 5: Cumulative probabilities for NI assessments

Table 3 shows the number of enrolments when any stopping decision is made (including reaching the maximum 2500 sample size).

Notes

  1. This is the total number of enrolments that are expected to have occurred, not how many are contributing to the inference in a given domain.
Code
d_tbl_2_cur <- d_tbl_2[, .(N_mu = mean(N_stopped)), keyby = .(scenario, desc, domain)]
d_tbl_2_cur <- dcast(d_tbl_2_cur, scenario + desc ~ domain, value.var = "N_mu")
d_tbl_2_cur <- d_tbl_2_cur[, .SD, .SDcols = !c("scenario")]

g_tbl <- d_tbl_2_cur |> 
  gt(groupname_col = "desc") |> 
  gt::text_transform(
    locations = cells_row_groups(),
    fn = function(x) {
      lapply(x, function(x) {
        gt::md(paste0("*", x, "*"))
      })
    }
  ) |>
  cols_align(
    columns = 1,
    align = "left"
  )  |> 
  cols_align(
    columns = 2:ncol(d_tbl_2_cur),
    align = "center"
  )  |> 
  tab_spanner(
    label = html("Expected number of total enrolments to hit stopping rule by domain"),
    columns = 2:ncol(d_tbl_2_cur)
  )  |>
  cols_label(
    `1` = "Surgical",
    `2` = "AB Duration",
    `3` = "AB Ext-proph",
    `4` = "AB choice"
  ) |>
  tab_options(
    table.font.size = "70%"
  ) |>
  fmt_number(decimals = 0, drop_trailing_zeros = TRUE)

g_tbl
Expected number of total enrolments to hit stopping rule by domain
Surgical AB Duration AB Ext-proph AB choice
Null effect in all domains
1,025 1,761 1,233 920
Moderate (OR 1.75) surgical revision effect (both one and two-stage)
1,018 1,634 1,050 914
Moderate (OR 1.75) surgical revision effect (one-stage only)
1,354 1,666 1,155 949
Moderate (OR 1.75) surgical revision effect (two-stage only)
1,336 1,682 1,140 933
Moderate (OR 1.75) antibiotic duration 6wk effect
1,099 1,074 1,148 904
Moderate (OR 1.75) antibiotic ext-proph 12wk effect
1,065 1,780 1,238 932
Moderate (OR 1.75) antibiotic choice rifampicin effect
1,038 1,748 1,182 902
Moderate (OR 1.75) effects in all domains
907 964 1,076 936
Large (OR 2.5) surgical revision effect (both one and two-stage)
571 1,601 996 916
Large (OR 2.5) surgical revision effect (one-stage only)
1,404 1,620 1,074 918
Large (OR 2.5) surgical revision effect (two-stage only)
1,082 1,657 988 948
Large (OR 2.5) antibiotic duration 6wk effect
1,104 754 1,166 962
Large (OR 2.5) antibiotic ext-proph 12wk effect
1,178 1,750 712 955
Large (OR 2.5) antibiotic choice rifampicin effect
1,095 1,743 1,155 558
Large (OR 2.5) effects in all domains
546 739 762 602
Moderate (OR 1.75) surgical revision effect (both one and two-stage) and antibiotic duration
936 1,657 1,066 931
Effects to achieve 90% power
812 1,150 1,086 1,141
Table 3: Expected number of enrolments to hit any stopping rule (including reaching maximum sample size)

Figure 6 shows the number of participants entering into each of the randomised comparisons by domain and scenario.

The expected values are calculated by extracting the number of participants entering in each analysis and taking the cumulative sum of these, restricted to the relevant strata. For example, the domain 1 (surgical) expected values are based on the participants in the late acute silo that receive randomised surgical intervention. Similarly, the domain 2 (antibiotic duration) expected values are based on the participants across all silos that received one-stage revision.

If a decision is made for a single domain, then subsequent enrolments are assigned to the relevant arm. For example, if a superiority decision is made for domain 1 and the trial is ongoing, then all subsequent participants are assigned to the superior intervention. Finally, if a decision is made for all research questions, then the trial stops early and we use last observation carried forward (LOCF) to propagate the sample size forward to subsequent analyses before computing the expected cumulative numbers by treatment arm.
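A minimal sketch of the LOCF step is given below for a single hypothetical simulated trial that stopped after the third analysis. The column names, the counts and the `locf()` helper are assumptions made for illustration and are not taken from the simulation code.

```r
# Cumulative numbers per arm at each analysis for one hypothetical trial that
# stopped after the third interim (NA means the analysis never took place).
d <- data.frame(
  analys  = 1:5,
  N_enrol = c(500, 1000, 1500, 2000, 2500),
  arm_1   = c(120, 245, 370, NA, NA),
  arm_2   = c(118, 240, 362, NA, NA)
)

# Last observation carried forward; leading NAs (if any) remain NA.
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # index of last non-NA
  idx[idx == 0] <- NA
  x[idx]
}

d$arm_1 <- locf(d$arm_1)
d$arm_2 <- locf(d$arm_2)
d
```

Once every simulated trial has a value at every analysis, the mean by scenario, domain, arm and analysis gives the expected cumulative numbers. With data.table, `nafill(x, type = "locf")` should give the same result for numeric columns.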

Code
d_fig <- d_N_by_arm[, .(mu_N = mean(N)), keyby = .(desc, domain, arm, N_enrol)]
d_fig[, desc := factor(
  desc,
  levels = unique(d_N_by_arm$desc))]
  
d_fig[, arm := factor(arm)]
ggplot(d_fig, aes(x = N_enrol, y = mu_N, col = arm)) +
  geom_line(lwd = 0.2) +
  geom_point(size = 0.4) +
  scale_x_continuous("") +
  scale_y_continuous("Expected number of participants") +
  scale_color_discrete("Treatment arm: ") +
  ggh4x::facet_grid2(
    desc ~ paste0("domain: ", domain), 
    scales = "free_y", independent = "y") +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
    strip.text.x.top = element_text(size = 4),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1),
    axis.title.y=element_text(size = 5),
    axis.text.y = element_text(size = 4), 
    axis.text.x = element_text(size = 4), 
    axis.title.x = element_text(size = 5),
    legend.text = element_text(size = 4)
  )
Figure 6: Expected number of participants on domain arms

Fraction of uncertainty resolved

Figure 9 shows the median and the 2.5% and 97.5% quantiles of the fraction of uncertainty resolved, defined as

\[ \begin{aligned} \text{Fraction \ Resolved} = 1 - \frac{Var(\beta_{post})}{Var(\beta_{pri})} \end{aligned} \]

where \(Var(\beta_{post})\) and \(Var(\beta_{pri})\) represent the variance associated with the posterior and prior belief, respectively, for the relevant treatment-effect log-odds ratio. This is simply a way to compare the prior and posterior variance. When the posterior is based on negligible data, its variance will be similar to that of the prior and the fraction resolved will be very small. A low fraction resolved (e.g. less than 0.5) suggests that any decision was made with a substantial amount of uncertainty remaining (you didn’t move far from your prior belief), whereas values close to unity suggest that a lot of the uncertainty has been resolved.
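As a small worked example of this quantity, the sketch below computes the fraction resolved from posterior draws of a log-odds ratio, using the prior standard deviation of 1 stated for the treatment effects. The function name and the two sets of draws are hypothetical.

```r
# Fraction of prior variance resolved, given posterior draws of a log-odds ratio
frac_resolved <- function(post_draws, prior_sd = 1) {
  1 - var(post_draws) / prior_sd^2
}

set.seed(1)
# Hypothetical posteriors: one barely updated from the prior, one well informed
fr_weak   <- frac_resolved(rnorm(1e5, mean = 0.1, sd = 0.9))  # near 1 - 0.81
fr_strong <- frac_resolved(rnorm(1e5, mean = 0.5, sd = 0.2))  # near 1 - 0.04
c(weak = fr_weak, strong = fr_strong)
```

Note that the ratio is taken on the variance scale, so a posterior standard deviation would need to be squared before being compared with the prior variance.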

  1. What is obvious from the above plots is also obvious here: the decisions made in the AB duration domain are subject to a substantial amount of uncertainty.
Code
l_cfg <- copy(l[[1]]$cfg)

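# initial uncertainty: prior variance of the treatment effect (prior SD squared)
v0 <- (l_cfg$pri$b_trt[2])^2

d_fig <- copy(d_tbl_3)
# posterior variance over prior variance, matching the formula above;
# assumes se_lor holds the posterior SD of the log-odds ratio, so it is squared
d_fig[, fr_unc := 1 - (se_lor^2/v0)]

d_fig <- d_fig[,
                 .(fr_unc = median(fr_unc),
                   q_025 = quantile(fr_unc, probs = 0.025),
                   q_975 = quantile(fr_unc, probs = 0.975)), 
                 keyby = .(scenario, desc, analys, domain, N)]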
# setorderv(d_fig, cols = "scenario", order = -1L)
d_fig[, desc := factor(desc, levels = unique(d_fig$desc))]
d_fig[, domain := factor(domain, 
                         levels = 1:4, 
                         labels = c("Surgical", "AB Duration", "AB Ext-proph", "AB Choice"))]

# d_tbl_3[scenario == 8, range(lor)]

ggplot(data = d_fig,  
       aes(x = N, y = fr_unc)) +
  geom_line(lwd = 0.25) +
  geom_linerange(aes(ymin = q_025, ymax = q_975), lwd = 0.25) +
  scale_x_continuous("") +
  scale_y_continuous("Fraction of uncertainty resolved (1-(V_post/V_pri))") +
  facet_grid(desc ~ domain , 
             labeller = labeller(desc = label_wrap_gen(15)), scales = "free_y") +
  theme_minimal() +
  theme(text = element_text(size = 6),
        strip.text.y.right = element_text(angle = 0, size = 5),
        strip.text.x = element_text(angle = 0, size = 5),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, size = 5),
        axis.text.y = element_text(size = 5))
Figure 9: Median values and quantiles for fraction of uncertainty resolved